Database Record Duplicate Detection System using Simil Algorithm
Authors
Abstract
Similar resources
Chapter 2 Duplicate Record Detection Using ANFIS
The problem of duplicate detection is to determine whether the same real-world object is represented by two or more distinct entries in a database. Duplicate detection is also known as record linkage or record matching. It is a widely researched topic and is of vital importance in fields such as master data management, data warehousing and ETL (Extraction, Transformation and Loading), cu...
Chapter 3 Duplicate Record Detection Using GA and PSO
This chapter extends the research discussed in Chapter 2 by applying optimization algorithms. Moises G. de Carvalho et al. (2011) proposed a genetic programming approach to record deduplication. This approach automatically derives a duplicate record detection function by combining several pieces of evidence extracted from the data. This function makes it possible to identify whether t...
Near Duplicate Web Page Detection using NDupDet Algorithm
The Web is a system of interlinked hypertext documents accessed via the Internet. The Internet is a global system of interconnected computer networks that serves billions of users worldwide. The huge number of documents on the Web poses a challenge for web search engines. The Web contains multiple copies of the same content or the same web page. Many of these pages on the Web are duplicates and near duplicates of othe...
Effective and Efficient XML Duplicate Detection Using Levenshtein Distance Algorithm
There is a large body of work on discovering duplicates in relational data; only a few studies concentrate on duplication in more complex hierarchical structures. Electronic information is one of the key factors in many business operations, applications, and decisions; as a result, guaranteeing its quality is necessary. Duplicates are multiple representations of ...
PSO Algorithm to Select Subsets of Q-Gram Features for Record Duplicate Detection
Although data quality issues arise with the ever-increasing quantity of data, it is a welcome sign that significant improvements have lately been made in data engineering. Consequently, private and government organizations have invested significantly in developing methods for removing replicas from data repositories. This phenomenon has sparked significant interest among researcher...
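As a rough illustration of the duplicate-detection problem described in the abstracts above (deciding whether two database entries represent the same real-world object), the following Python sketch flags record pairs whose fields are similar under a normalized Levenshtein distance, the measure named in the XML duplicate detection title. It is not the Simil algorithm of this article or the method of any listed paper; the record layout, the concatenation of fields into one string and the 0.75 threshold are hypothetical choices made only for the example.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (dynamic programming, two rows)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,           # deletion
                               current[j - 1] + 1,        # insertion
                               previous[j - 1] + cost))   # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical (case-insensitive)."""
    a, b = a.lower().strip(), b.lower().strip()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def find_duplicates(records, threshold=0.75):
    """Compare every pair of records (naive all-pairs comparison) and return
    the index pairs whose concatenated fields reach `threshold` similarity.
    The default threshold is a hypothetical value, not taken from the source."""
    keys = [" ".join(str(v) for v in r) for r in records]
    pairs = []
    for i, j in combinations(range(len(records)), 2):
        if similarity(keys[i], keys[j]) >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical sample records: (name, address)
records = [
    ("John Smith", "12 Main Street"),
    ("Jon Smith", "12 Main St."),
    ("Mary Jones", "4 Oak Avenue"),
]
print(find_duplicates(records))  # prints [(0, 1)]
```

Comparing every pair grows quadratically with the number of records, so practical record-linkage systems usually reduce the set of candidate pairs first (for example, by blocking on a shared key) before applying a similarity measure.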
Journal
Journal title: International Journal on Computer Science and Engineering
Year: 2018
ISSN: 2229-5631,0975-3397
DOI: 10.21817/ijcse/2018/v10i2/181002013